我们研究视觉变压器(VIT)的半监督学习(SSL),尽管VIT架构广泛采用了不同的任务,但视觉变形金刚(VIT)还是一个不足的主题。为了解决这个问题,我们提出了一条新的SSL管道,该管道由第一个联合国/自制的预训练组成,然后是监督的微调,最后是半监督的微调。在半监督的微调阶段,我们采用指数的移动平均线(EMA) - 教师框架,而不是流行的FixMatch,因为前者更稳定,并且为半手不见的视觉变压器提供了更高的准确性。此外,我们提出了一种概率的伪混合机制来插入未标记的样品及其伪标签以改善正则化,这对于训练电感偏差较弱的训练VIT很重要。我们所提出的方法被称为半vit,比半监督分类设置中的CNN对应物获得可比性或更好的性能。半vit还享受VIT的可伸缩性优势,可以很容易地扩展到具有越来越高的精度的大型模型。例如,半效率总数仅使用1%标签在Imagenet上获得令人印象深刻的80%TOP-1精度,使用100%ImageNet标签与Inception-V4相当。
translated by 谷歌翻译
我们提出了MACLR,这是一种新颖的方法,可显式执行从视觉和运动方式中学习的跨模式自我监督的视频表示。与以前的视频表示学习方法相比,主要关注学习运动线索的研究方法是隐含的RGB输入,MACLR丰富了RGB视频片段的标准对比度学习目标,具有运动途径和视觉途径之间的跨模式学习目标。我们表明,使用我们的MACLR方法学到的表示形式更多地关注前景运动区域,因此可以更好地推广到下游任务。为了证明这一点,我们在五个数据集上评估了MACLR,以进行动作识别和动作检测,并在所有数据集上展示最先进的自我监督性能。此外,我们表明MACLR表示可以像在UCF101和HMDB51行动识别的全面监督下所学的表示一样有效,甚至超过了对Vidsitu和SSV2的行动识别的监督表示,以及对AVA的动作检测。
translated by 谷歌翻译
我们提出了块茎:一种简单的时空视频动作检测解决方案。与依赖于离线演员检测器或手工设计的演员位置假设的现有方法不同,我们建议通过同时执行动作定位和识别从单个表示来直接检测视频中的动作微管。块茎学习一组管芯查询,并利用微调模块来模拟视频剪辑的动态时空性质,其有效地加强了与在时空空间中的演员位置假设相比的模型容量。对于包含过渡状态或场景变更的视频,我们提出了一种上下文意识的分类头来利用短期和长期上下文来加强行动分类,以及用于检测精确的时间动作程度的动作开关回归头。块茎直接产生具有可变长度的动作管,甚至对长视频剪辑保持良好的结果。块茎在常用的动作检测数据集AVA,UCF101-24和JHMDB51-21上优于先前的最先进。
translated by 谷歌翻译
An Anomaly Detection (AD) System for Self-diagnosis has been developed for Multiphase Flow Meter (MPFM). The system relies on machine learning algorithms for time series forecasting, historical data have been used to train a model and to predict the behavior of a sensor and, thus, to detect anomalies.
translated by 谷歌翻译
Data-driven soft sensors are extensively used in industrial and chemical processes to predict hard-to-measure process variables whose real value is difficult to track during routine operations. The regression models used by these sensors often require a large number of labeled examples, yet obtaining the label information can be very expensive given the high time and cost required by quality inspections. In this context, active learning methods can be highly beneficial as they can suggest the most informative labels to query. However, most of the active learning strategies proposed for regression focus on the offline setting. In this work, we adapt some of these approaches to the stream-based scenario and show how they can be used to select the most informative data points. We also demonstrate how to use a semi-supervised architecture based on orthogonal autoencoders to learn salient features in a lower dimensional space. The Tennessee Eastman Process is used to compare the predictive performance of the proposed approaches.
translated by 谷歌翻译
Anomaly Detection is a relevant problem that arises in numerous real-world applications, especially when dealing with images. However, there has been little research for this task in the Continual Learning setting. In this work, we introduce a novel approach called SCALE (SCALing is Enough) to perform Compressed Replay in a framework for Anomaly Detection in Continual Learning setting. The proposed technique scales and compresses the original images using a Super Resolution model which, to the best of our knowledge, is studied for the first time in the Continual Learning setting. SCALE can achieve a high level of compression while maintaining a high level of image reconstruction quality. In conjunction with other Anomaly Detection approaches, it can achieve optimal results. To validate the proposed approach, we use a real-world dataset of images with pixel-based anomalies, with the scope to provide a reliable benchmark for Anomaly Detection in the context of Continual Learning, serving as a foundation for further advancements in the field.
translated by 谷歌翻译
In this paper we present TruFor, a forensic framework that can be applied to a large variety of image manipulation methods, from classic cheapfakes to more recent manipulations based on deep learning. We rely on the extraction of both high-level and low-level traces through a transformer-based fusion architecture that combines the RGB image and a learned noise-sensitive fingerprint. The latter learns to embed the artifacts related to the camera internal and external processing by training only on real data in a self-supervised manner. Forgeries are detected as deviations from the expected regular pattern that characterizes each pristine image. Looking for anomalies makes the approach able to robustly detect a variety of local manipulations, ensuring generalization. In addition to a pixel-level localization map and a whole-image integrity score, our approach outputs a reliability map that highlights areas where localization predictions may be error-prone. This is particularly important in forensic applications in order to reduce false alarms and allow for a large scale analysis. Extensive experiments on several datasets show that our method is able to reliably detect and localize both cheapfakes and deepfakes manipulations outperforming state-of-the-art works. Code will be publicly available at https://grip-unina.github.io/TruFor/
translated by 谷歌翻译
A systematic review on machine-learning strategies for improving generalizability (cross-subjects and cross-sessions) electroencephalography (EEG) based in emotion classification was realized. In this context, the non-stationarity of EEG signals is a critical issue and can lead to the Dataset Shift problem. Several architectures and methods have been proposed to address this issue, mainly based on transfer learning methods. 418 papers were retrieved from the Scopus, IEEE Xplore and PubMed databases through a search query focusing on modern machine learning techniques for generalization in EEG-based emotion assessment. Among these papers, 75 were found eligible based on their relevance to the problem. Studies lacking a specific cross-subject and cross-session validation strategy and making use of other biosignals as support were excluded. On the basis of the selected papers' analysis, a taxonomy of the studies employing Machine Learning (ML) methods was proposed, together with a brief discussion on the different ML approaches involved. The studies with the best results in terms of average classification accuracy were identified, supporting that transfer learning methods seem to perform better than other approaches. A discussion is proposed on the impact of (i) the emotion theoretical models and (ii) psychological screening of the experimental sample on the classifier performances.
translated by 谷歌翻译
Recent work has reported that AI classifiers trained on audio recordings can accurately predict severe acute respiratory syndrome coronavirus 2 (SARSCoV2) infection status. Here, we undertake a large scale study of audio-based deep learning classifiers, as part of the UK governments pandemic response. We collect and analyse a dataset of audio recordings from 67,842 individuals with linked metadata, including reverse transcription polymerase chain reaction (PCR) test outcomes, of whom 23,514 tested positive for SARS CoV 2. Subjects were recruited via the UK governments National Health Service Test-and-Trace programme and the REal-time Assessment of Community Transmission (REACT) randomised surveillance survey. In an unadjusted analysis of our dataset AI classifiers predict SARS-CoV-2 infection status with high accuracy (Receiver Operating Characteristic Area Under the Curve (ROCAUC) 0.846 [0.838, 0.854]) consistent with the findings of previous studies. However, after matching on measured confounders, such as age, gender, and self reported symptoms, our classifiers performance is much weaker (ROC-AUC 0.619 [0.594, 0.644]). Upon quantifying the utility of audio based classifiers in practical settings, we find them to be outperformed by simple predictive scores based on user reported symptoms.
translated by 谷歌翻译
Since early in the coronavirus disease 2019 (COVID-19) pandemic, there has been interest in using artificial intelligence methods to predict COVID-19 infection status based on vocal audio signals, for example cough recordings. However, existing studies have limitations in terms of data collection and of the assessment of the performances of the proposed predictive models. This paper rigorously assesses state-of-the-art machine learning techniques used to predict COVID-19 infection status based on vocal audio signals, using a dataset collected by the UK Health Security Agency. This dataset includes acoustic recordings and extensive study participant meta-data. We provide guidelines on testing the performance of methods to classify COVID-19 infection status based on acoustic features and we discuss how these can be extended more generally to the development and assessment of predictive methods based on public health datasets.
translated by 谷歌翻译